
Representation of probabilistic scientific knowledge

Author: De Grave K
King RD
Rzhetsky A
Soldatova LN
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

This article is available through the Brunel Open Access Publishing Fund. Copyright © 2013 Soldatova et al; licensee BioMed Central Ltd. The theory of probability is widely used in biomedical research for data analysis and modelling. In previous work the probabilities of the research hypotheses have been recorded as experimental metadata. The ontology HELO is designed to support probabilistic reasoning, and provides semantic descriptors for reporting on research that involves operations with probabilities. HELO explicitly links research statements such as hypotheses, models, laws, conclusions, etc. to the associated probabilities of these statements being true. HELO enables the explicit semantic representation and accurate recording of probabilities in hypotheses, as well as the inference methods used to generate and update those hypotheses. We demonstrate the utility of HELO on three worked examples: changes in the probability of the hypothesis that sirtuins regulate human life span; changes in the probability of hypotheses about gene functions in the S. cerevisiae aromatic amino acid pathway; and the use of active learning in drug design (quantitative structure activity relation learning), where a strategy for the selection of compounds with the highest probability of improving on the best known compound was used. HELO is open source and available at https://github.com/larisa-soldatova/HELO. This work was partially supported by grant BB/F008228/1 from the UK Biotechnology & Biological Sciences Research Council, from the European Commission under the FP7 Collaborative Programme, UNICELLSYS, KU Leuven GOA/08/008 and ERC Starting Grant 240186.

Lirias

Maastricht University Research Portal

Selected papers from the 16th Annual Bio-Ontologies Special Interest Group Meeting

Author: Dumontier M
Rocco-Serra P
Shah NH
Soldatova LN
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Copyright © 2014 Soldatova et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Over the past 16 years, the Bio-Ontologies SIG at ISMB has provided a forum for vibrant discussions of the latest and most innovative advances in the research area of bio-ontologies, its applications to biomedicine and, more generally, in the organisation, sharing and re-use of knowledge in biomedicine and the life sciences. The six papers selected for this supplement span a wide range of topics, including ontology-based data integration, ontology-based annotation of scientific literature, ontology and data model development, representation of scientific results, and gene candidate prediction.

Automating sciences: Philosophical and social dimensions

Author: King RD
Mellingwood C
Schuler Costa V
Soldatova LN
Publication venue: IEEE Technology and Society Magazine
Publication date: 01/03/2018

The University of Manchester - Institutional Repository

Apollo (Cambridge)

EXACT2: the semantics of biomedical protocols

Author: A Maccagnan
A Pease
A Sackmann
A Sujathaa
Brian B Rudkin
CJ Mungall
Daniel Nadis
Doi
Emma Haddi
Grunwald
H Obokata
I Mura
J Taubert
K Wolstencroft
Larisa N Soldatova
LN Soldatova
M Courtot
M Hilario
M Schilling
Nigel J Saunders
Piyali S Basu
R Garside
RD King
Ross D King
RR Brinkman
S Mitchell
S Rune
S Shapin
T Bittner
T Klingström
Th Paul
V Rätzel
Véronique Baumlé
W Ceusters
Wolfgang Marwan
Z Xiang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

© 2014 Soldatova et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. This article has been made available through the Brunel Open Access Publishing Fund. Background: The reliability and reproducibility of experimental procedures is a cornerstone of scientific practice. There is a pressing technological need for the better representation of biomedical protocols to enable other agents (human or machine) to better reproduce results. A framework that ensures that all information required for the replication of experimental protocols is captured is essential to achieve reproducibility. Methods: We have developed the ontology EXACT2 (EXperimental ACTions) that is designed to capture the full semantics of biomedical protocols required for their reproducibility. To construct EXACT2 we manually inspected hundreds of published and commercial biomedical protocols from several areas of biomedicine. After establishing a clear pattern for extracting the required information we utilized text-mining tools to translate the protocols into a machine-amenable format. We have verified the utility of EXACT2 through the successful processing of previously ‘unseen’ (not used for the construction of EXACT2) protocols. Results: The paper reports on a fundamentally new version, EXACT2, that supports the semantically-defined representation of biomedical protocols. The ability of EXACT2 to capture the semantics of biomedical procedures was verified through a text mining use case. In this use case, EXACT2 is used as a reference model for text mining tools to identify terms pertinent to experimental actions, and their properties, in biomedical protocols expressed in natural language. An EXACT2-based framework for the translation of biomedical protocols to a machine-amenable format is proposed. Conclusions: The EXACT2 ontology is sufficient to record, in a machine-processable form, the essential information about biomedical protocols. EXACT2 defines explicit semantics of experimental actions, and can be used by various computer applications. It can serve as a reference model for the translation of biomedical protocols in natural language into a semantically-defined format. This work has been partially funded by the Brunel University BRIEF award and a grant from Occams Resources.

Goldsmiths Research Online

Ontology of core data mining entities

Author: A Bernstein
A Golbraikh
A Karalic
B Smith
C Silla
C Vens
D Demšar
D Kocev
D Qi
D Young
DJ Hand
F Serban
G Madjarov
G Tsoumakas
GH Bakir
H Mannila
HP Kriegel
I Slavkov
J Vanschoren
K Button
Larisa Soldatova
LN Soldatova
M Courtot
M Ford
M Žáková
MA Avery
MF López
O Spjuth
P Robinson
Panče Panov
Q Yang
R Caruana
R Guha
RD King
RR Brinkman
Sašo Džeroski
T Dietterich
V Podpečan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/07/2014

In this article, we present OntoDM-core, an ontology of core data mining entities. OntoDM-core defines the most essential data mining entities in a three-layered ontological structure comprising a specification, an implementation and an application layer. It provides a representational framework for the description of mining structured data, and in addition provides taxonomies of datasets, data mining tasks, generalizations, data mining algorithms and constraints, based on the type of data. OntoDM-core is designed to support a wide range of applications/use cases, such as semantic annotation of data mining algorithms, datasets and results; annotation of QSAR studies in the context of drug discovery investigations; and disambiguation of terms in text mining. The ontology has been thoroughly assessed following the practices in ontology engineering, is fully interoperable with many domain resources and is easy to extend.

Exploring forest structural complexity by multi-scale segmentation of VHR imagery

Author: A Baranova
A Brazma
A Gomez-Perez
AJ Milsted
B Latour
C Abe
CF Taylor
G Hughes
H Rijgersberg
J Downing
J-R Park
JG Frey
L Soldatova
LN Soldatova
M Agosti
MB Jones
NA Vasilevsky
NJJP Koenderink
O Flórez-Vargas
R Brinkman
RJ Robertson
SJ Coles
Publication venue: Elsevier
Publication date: 01/01/2008

Forests are complex ecological systems, characterised by multiple-scale structural and dynamical patterns which are not inferable from a system description that spans only a narrow window of resolution; this makes their investigation a difficult task using standard field sampling protocols. We segment a QuickBird image covering a beech forest in an initial stage of old-growthness (showing, accordingly, a good degree of structural complexity) into three segmentation levels. We apply field-based diversity indices of tree size, spacing, and species assemblage to quantify structural heterogeneity amongst forest regions delineated by segmentation. The aim of the study is to evaluate, on a statistical basis, the relationships between spectrally delineated image segments and observed spatial heterogeneity in forest structure, including gaps in the outer canopy. Results show that: some 45% of the segments generated at the coarser segmentation scale (level 1) are surrounded by structurally different neighbours; level 2 segments distinguish spatial heterogeneity in forest structure in about 63% of level 1 segments; level 3 image segments better detect canopy gaps than differences in the spatial pattern of the investigated structural indices. Results also support the idea of a mixture of macro and micro structural heterogeneity within the beech forest: large size populations of trees homogeneous for the examined structural indices at the coarser segmentation level, when analysed at a finer scale, are internally heterogeneous; and vice versa. Findings from this study demonstrate that multiresolution segmentation is able to delineate scale-dependent patterns of forest structural heterogeneity, even in an initial stage of old-growth structural differentiation. This tool therefore has the potential to improve the sampling design of field surveys aimed at characterizing forest structural complexity across multiple spatio-temporal scales. The article is available on the publisher's website www.sciencedirect.co

Goldsmiths Research Online

VU Research Portal

Unitus DSpace

Closed-loop cycles of experiment design, execution, and learning accelerate systems biology model development in yeast

Author: Bouthinon D
Carpenter M
Coutant A
Elati M
Grzebyta J
King RD
Ramon J
Roper K
Rouveirol C
Santini G
Soldano H
Soldatova LN
Trejo-Banos D
Publication venue: 'Proceedings of the National Academy of Sciences'
Publication date: 01/01/2019

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1900548116/-/DCSupplemental. Copyright © 2019 The Author(s). One of the most challenging tasks in modern science is the development of systems biology models: Existing models are often very complex but generally have low predictive performance. The construction of high-fidelity models will require hundreds/thousands of cycles of model improvement, yet few current systems biology research studies complete even a single cycle. We combined multiple software tools with integrated laboratory robotics to execute three cycles of model improvement of the prototypical eukaryotic cellular transformation, the yeast (Saccharomyces cerevisiae) diauxic shift. In the first cycle, a model outperforming the best previous diauxic shift model was developed using bioinformatic and systems biology tools. In the second cycle, the model was further improved using automatically planned experiments. In the third cycle, hypothesis-led experiments improved the model to a greater extent than achieved using high-throughput experiments. All of the experiments were formalized and communicated to a cloud laboratory automation system (Eve) for automatic execution, and the results stored on the semantic web for reuse. The final model adds a substantial amount of knowledge about the yeast diauxic shift: 92 genes (+45%), and 1,048 interactions (+147%). This knowledge is also relevant to understanding cancer, the immune system, and aging. We conclude that systems biology software tools can be combined and integrated with laboratory robots in closed-loop cycles. CHIST-ERA AdaLab project: The Engineering and Physical Sciences Research Council (EPSRC), UK (EP/M015661/1); ANR-14-CHR2-0001-01.

HAL Evry

INRIA a CCSD electronic archive server

HAL Descartes

The University of Manchester - Institutional Repository

HAL-CEA

HAL-Paris 13

Hal-Diderot

MPHASYS: a mouse phenotype analysis system

Author: CL Smith
D Naf
EJ Baker
H Ikeda
H Masuya
Harry van Steeg
I Saira Mian
Jan Vijg
K Paigen
KA Ching
LN Soldatova
M Ashburner
MD Waters
NF Noy
Paul HM Lohman
R Brent Calder
RR Maronpot
Rudolf B Beems
S Philippi
SWP Wijnhoven
T Clark
TF Hayamizu
Publication venue: BioMed Central
Publication date: 01/06/2007

Background: Systematic, high-throughput studies of mouse phenotypes have been hampered by the inability to analyze individual animal data from a multitude of sources in an integrated manner. Studies generally make comparisons at the level of genotype or treatment, thereby excluding associations that may be subtle or involve compound phenotypes. Additionally, the lack of integrated, standardized ontologies and methodologies for data exchange has inhibited scientific collaboration and discovery. Results: Here we introduce a Mouse Phenotype Analysis System (MPHASYS), a platform for integrating data generated by studies of mouse models of human biology and disease such as aging and cancer. This computational platform is designed to provide a standardized methodology for working with animal data; a framework for data entry, analysis and sharing; and ontologies and methodologies for ensuring accurate data capture. We describe the tools that currently comprise MPHASYS, primarily ones related to mouse pathology, and outline its use in a study of individual animal-specific patterns of multiple pathology in mice harboring a specific germline mutation in the DNA repair and transcription-specific gene Xpd. Conclusion: MPHASYS is a system for analyzing multiple data types from individual animals. It provides a framework for developing data analysis applications, and tools for collecting and distributing high-quality data. The software is platform independent and freely available under an open-source license [1].

Directory of Open Access Journals

UCL Discovery

RDFScape: Semantic Web meets Systems Biology

Author: A Bairoch
A Ruttenberg
Andrea Splendiani
B McBride
B Smith
E Sirin
EK Neumann
I Vastrik
JS Luciano
K Wolstencroft
LN Soldatova
M Ashburner
M Kaneisha
M Krummenacker
MA Storey
O Garcia
P Karp
P Khatri
P Shannon
PD Karp
S Schulz
T Berners-Lee
U Leser
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The recent availability of high-throughput data in molecular biology has increased the need for a formal representation of this knowledge domain. New ontologies are being developed to formalize knowledge, e.g. about the functions of proteins. As the Semantic Web is being introduced into the Life Sciences, the basis for a distributed knowledge-base that can foster biological data analysis is laid. However, there still is a dichotomy, in tools and methodologies, between the use of ontologies in biological investigation, that is, in relation to experimental observations, and their use as a knowledge-base. Results: RDFScape is a plugin that has been developed to extend a software oriented to biological analysis with support for reasoning on ontologies in the semantic web framework. We show with this plugin how the use of ontological knowledge in biological analysis can be extended through the use of inference. In particular, we present two examples relative to ontologies representing biological pathways: we demonstrate how these can be abstracted and visualized as interaction networks, and how reasoning on causal dependencies within elements of pathways can be implemented. Conclusions: The use of ontologies for the interpretation of high-throughput biological data can be improved through the use of inference. This allows the use of ontologies not only as annotations, but as a knowledge-base from which new information relevant for specific analysis can be derived.

Directory of Open Access Journals